Abstract


A  metamodel  is  said  to  be  embedded  within  a  simulation  if  it  is  used  to  replace  a 
submodule  of  that  simulation.  Replacing  a  deterministic  module  with  an  embedded 
deterministic  metamodel  poses  no  apparent  mathematical  problems.  However,  using  a 
deterministic  metamodel  to  replace  a  stochastic  simulation  component  could  require 
additional  corrective  actions. 

Intuitively,  replacing  a  stochastic  simulation  component  with  a  deterministic 
polynomial  metamodel  should  reduce  the  variance  of  the  ‘parent’  simulation.  If  a 
simulation  component  that  exhibits  variation  in  its  output  for  a  given  set  of  inputs  is 
replaced  by  a  deterministic  equation  with  no  comparable  variation  in  its  output,  the 
variance  of  the  ‘parent’  simulation  could  be  reduced  inappropriately. 

This research investigated the effects of metamodel substitution in two phases.
The first phase dealt with a set of tandem queues. It was shown that as each queue was
sequentially  replaced  with  a  metamodel,  the  total  system  variance  was  inappropriately 
diminished.  A  theoretical  model  of  the  error  components  was  postulated  and  used  to 
compensate  for  this  missing  variation,  restoring  the  parent  simulation’s  variance  to 
approximately  its  original  level.  In  the  second  phase,  the  problem  was  extended  to  the 
case  of  repeatedly  sampling  an  embedded  metamodel.  Again,  as  the  theoretical  model  of 
the  error  components  predicted,  the  metamodel  substitution  inappropriately  diminished  the 
variance  of  the  parent  simulation  in  scenarios  where  there  was  at  least  a  moderate  degree 
of  variance  in  the  submodule  that  was  replaced.  The  diminished  variance  was  again 
compensated  for  by  appealing  to  the  theoretical  model  of  error  components  introduced  in 
the  first  phase. 

In  addition,  guidelines  for  metamodel  use  were  presented.  In  some  situations, 
sampling from a probability distribution is more appropriate than the use of a metamodel.
THE EFFECT OF REPEATEDLY SAMPLING AN EMBEDDED
METAMODEL  ON  THE  SIMULATION  RESPONSE 


I.  Introduction 


Background 

Computer  simulation,  in  its  broadest  sense,  is  the  process  of  designing  a 
mathematical-logical  model  of  a  real  system  and  experimenting  with  this  model  on  a 
computer  (Pritsker,  1986:6).  In  this  sense,  the  computer  is  basically  an  extension  of  the 
analytical  approach  to  reality  (Patterson,  1972:60).  As  analysts  have  tried  to  model  larger 
and  more  complex  problems,  computer  simulation  models  have  grown  in  size  and 
complexity.  The  growing  size  and  complexity  of  simulation  models  has  led  researchers  to 
improve  methods  for  organizing,  developing,  and  accelerating  the  run  times  of  simulation 
models. 

One  method  of  organizing  the  growing  complexity  of  computer  simulation  models 
is  the  use  of  hierarchical  modeling.  In  general,  hierarchical  modeling  provides  a  means  for 
managing  system  complexity  by  partitioning  the  system  into  logical,  usable  chunks  which 
can  then  be  independently  manipulated  (created,  changed,  extended)  (Luna,  1993:132). 

There  are  several  reasons  that  [a]  hierarchical  modeling  capability  is 
desirable.  These  include  modeling  ease  (e.g.,  reducing  the  time  and  effort  required 
to develop models), allowing for model reuse (a topic of long standing interest . . .),
reducing the number of specific models required, allowing for the use of a data
base of models . . . , and aiding in model validation . . . (Sargent, 1993:569).

A rather straightforward approach to hierarchical modeling is to decompose a model into
‘connected submodels’ (Sargent, 1993:569). A second approach is to replace one or
more of the submodels with a corresponding empirical approximation or metamodel
(Sargent, 1993:569). These metamodels are the main focus of this thesis.
Whereas a computer simulation is a model of reality, a metamodel is a model of a
computer simulation, and is thus a model of a model; hence, the name metamodel.
Metamodels are often linear regression models. While this thesis deals exclusively with
least squares regression metamodels, piecewise linear models, splines, inverse
polynomials, and Fourier transformations may also serve as metamodels (Kleijnen,
1987:149). Other metamodel forms include kernel smoothing, radial basis functions,
spatial correlation models, wavelets, and neural networks (Barton, 1993:12). Regardless
of  its  type,  a  metamodel  relates  the  output  of  a  simulation  to  its  input  factors.  A 
metamodel,  then,  is  an  empirical  model  based  on  an  underlying  causal  (or  mechanistic) 
simulation  model. 

Metamodels  are  used  for  validation,  estimation  of  factor  interactions,  control, 
optimization,  and  so  on  (Kleijnen,  1987:149).  The  primary  use  of  the  metamodel  is  to 
gain  insight  into  the  simulation  from  which  the  metamodel  was  built  or  created. 

The  use  of  a  metamodel  enables  one  to  interpret  the  simulated  system  more 
easily  and  more  fully,  especially  with  regard  to  performing  sensitivity  analysis, 
evaluating  the  effect  of  specific  values  of  the  input  variables  on  the  response 
measurements, and answering the ‘what-if’ question without the need for
additional  runs.  After  all,  once  the  metamodel  has  been  developed  and  validated, 
further  investigation  of  the  real  system  using  this  metamodel  is  simpler  and  less 
costly  than  conducting  additional  simulation  experiments.  In  addition,  the 
coefficients  calculated  for  the  regression  metamodel  may  give  the  researcher  a 
better  understanding  of  the  relationships  between  the  input  variables  and  the 
response  measurement  of  interest  (Friedman,  1985:144). 


Since  a  fully  developed  and  validated  metamodel  eliminates  the  need  for  further  simulation 
runs,  the  ensuing  analyses  of  the  real-world  system  would  generally  focus  on  the 
metamodel  almost  exclusively  (Ghosh,  1988:70). 

Another possible use of metamodels arises, as Kleijnen stated, when there is a
‘parent’  simulation  that  uses  the  output  of  a  ‘lower-level  component’  simulation.  An 
example  might  occur  in  a  theater-level  military  simulation.  A  larger  theater-level 
simulation might call on a lower-level component simulation to assist in resolving a
‘one-on-one’ conflict between two aircraft. The lower-level component simulation would
determine the probability of a ‘kill’, which the parent simulation would then use to
determine if a ‘kill’ occurred. It might be possible to replace this lower-level component
simulation with a metamodel. When a parent simulation uses the output of a metamodel
contained  within  it,  the  metamodel  is  said  to  be  embedded  within  the  parent  simulation. 
When  the  parent  simulation  repeatedly  uses  the  output  of  the  embedded  metamodel,  it  is 
said  to  repeatedly  sample  from  this  embedded  metamodel.  There  are  several  aspects 
regarding  this  process  of  replacing  a  lower-level  component  simulation  with  a  metamodel 
that  should  be  considered. 

One  of  these  considerations  is  the  difference  between  a  stochastic  simulation  model 
and  a  metamodel.  A  computer  simulation  is  random  or  stochastic  when  a  random  number 
generator  is  used  to  simulate  the  random  nature  of  the  system  or  process  under 
investigation.  In  contrast,  a  least-squares  regression  function  is  deterministic;  for  a  given 
set  of  inputs,  a  least  squares  regression  metamodel  always  yields  the  same  result.  Like  a 
least  squares  regression  function,  a  deterministic  simulation  always  produces  the  same 
response  for  a  given  set  of  inputs.  Deterministic  simulations  may  be  viewed  as  a  special 
case  of  random  simulations;  given  a  specified  set  of  input  parameters,  the  output  assumes 
a  specific  value  with  probability  equal  to  one  (Kleijnen,  1987:148). 

Thus,  if  we  simply  replace  a  lower-level  component  in  a  simulation  with  a  least- 
squares  metamodel,  we  would  expect  the  output  variance  of  the  parent  simulation  to 
diminish.  If  there  is  a  reduction  in  output  variability  associated  with  substituting  a 
metamodel  for  a  lower-level  component  simulation,  can  this  lack  of  variability  then  be 
replaced?  The  currently  available  literature  concerning  metamodels  has  yet  to  address  this 
question. 
Problem Statement

There  has  been  little  research  on  the  effect  of  embedded  metamodels  on  the 
variability  of  simulation  responses.  Furthermore,  there  are  no  published  guidelines  that 
indicate  when  it  is  appropriate  to  substitute  a  metamodel  for  a  ‘lower-level  component’ 
simulation. 

Objective 

The purpose of this work is to (1) provide insight into how embedded metamodels
affect the output of a parent simulation, particularly with regard to repeatedly sampling an
embedded metamodel, and (2) suggest guidelines for their use.

Overview 

Chapter II provides some background on metamodels, a survey of relevant
research issues regarding embedded metamodels, as well as some examples of metamodel
applications. As alluded to earlier, this body of literature offers little guidance regarding
embedded metamodels. In Chapter III the issue of substituting a deterministic metamodel
for a stochastic simulation component is explored via a simulation model of tandem
queues. This process is then generalized in Chapter IV where the issue of repeatedly
sampling an embedded metamodel is developed. Having discussed methods for
metamodel usage in Chapters III and IV, Chapter V discusses potential pitfalls of
inappropriate  metamodel  uses.  Finally,  Chapter  VI  closes  this  thesis  with  some 
conclusions  and  recommendations. 
II. Background


Overview 

Metamodels have been used by the simulation community for almost twenty years
(Barton, 1993:12). Despite impressive advances in computing and simulation capabilities,
metamodels continue to be effective representations of input-output relationships due to
their ease of use and straightforward application.

The  value  of  metamodels  lies  in  their  simplicity.  A  simple  model  is  easy  to 
understand,  and  so  it  is  possible  to  gain  insight  from  a  metamodel  of  a  complex 
situation  whose  structure  defies  obvious  insight  into  its  behavior.  A  simple  model 
is  also  a  fast  running  model,  and  so  simulation  metamodels  can  be  used  for 
interactive “what if?” studies in place of the time consuming original code (Barton,
1993:12).

Metamodels  may  be  represented  as  piecewise  linear  models,  splines,  inverse 
polynomials,  Fourier  transformations,  kernel  smoothing,  radial  basis  functions,  spatial 
correlation  models,  wavelets,  or  neural  networks.  However,  they  often  are  polynomial 
regression  metamodels  of  the  form  (Kleijnen,  1987:149) 

$y = g(x) + \varepsilon$  (2.1)

where $y$ is the actual value, $g(x)$ is the true metamodel, and $\varepsilon$ is the error. In practice, the
true metamodel is rarely known and must be estimated by

$g(x) \approx f(x) = \hat{y}$  (2.2)

where $f(x)$ is the polynomial approximation of $g(x)$ and $\hat{y}$ is the metamodel estimate of the
actual value of $y$. For reasons of practicality and interpretability, polynomial metamodels
frequently  are  limited  to  first  and  second  order  linear  models  (Kleijnen,  1987:200). 
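
As an illustration of Equations 2.1 and 2.2, the following minimal sketch fits a first order least squares metamodel to replicated output from a stand-in stochastic simulation. The function `toy_simulation`, its coefficients, the error standard deviation, and the design points are all hypothetical choices made only for this example; they do not come from the experiments in this thesis.

```python
import random


def fit_first_order(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def toy_simulation(x, rng):
    """Hypothetical stochastic simulation: g(x) = 2 + 3x plus N(0, 0.5^2) error."""
    return 2.0 + 3.0 * x + rng.gauss(0.0, 0.5)


rng = random.Random(1)
design_points = [1.0, 2.0, 3.0, 4.0, 5.0]
xs = [x for x in design_points for _ in range(10)]  # 10 replications per point
ys = [toy_simulation(x, rng) for x in xs]

b0, b1 = fit_first_order(xs, ys)


def metamodel(x):
    """Fitted metamodel f(x); deterministic, so repeated calls agree exactly."""
    return b0 + b1 * x
```

Calling `toy_simulation` twice at the same `x` generally gives two different values, whereas `metamodel(x)` always returns the same value. That contrast is the source of the variance question examined in the following chapters.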
Research Issues


Most  of  the  research  pertaining  to  metamodels  has  focused  on  the  technical  details 
of  actually  developing  and  exploiting  their  many  possible  forms.  Other  research  related  to 
developing  metamodels  includes  analysis  of  error  components  (Tew,  1994:7)  and 
alternatives  to  polynomial  regressions  (Barton,  1992:289;  Barton,  1993:12). 

Developing  and  interpreting  metamodels  raises  a  host  of  issues. 

The  major  issues  in  metamodeling  include:  i)  the  choice  of  a  functional  form  for  f, 
ii)  the  design  of  experiments,  i.e.,  the  selection  of  a  set  of  x  points  at  which  to 
observe  y  (run  the  full  model)  to  adjust  the  fit  of  f  to  g,  the  assignment  of  random 
number  streams,  the  length  of  runs,  etc.,  and  iii)  the  assessment  of  the  adequacy  of 
the  fitted  metamodel  (confidence  intervals,  hypothesis  tests,  lack  of  fit  and  other 
diagnostics).  The  functional  form  will  generally  be  described  as  a  linear 
combination  of  basis  functions  from  a  parametric  family.  So  there  are  choices  for 
families  (e.g.,  polynomials,  sine  functions,  piecewise  polynomials,  wavelets,  etc.) 
and  choices  for  the  way  to  pick  the  ‘best’  representation  from  within  the  family 
(e.g.  least  squares,  maximum  likelihood,  cross  validation,  etc.).  The  issues  of 
experimental  design  and  metamodel  assessment  are  related  since  the  selection  of  an 
experimental  design  will  be  determined  in  part  by  its  effects  on  assessment  issues 
(Barton,  1992:290). 

The  experimental  design  used  to  create  the  metamodel  has  also  been  an  area  of 
research.  Response  Surface  Methodology  (RSM)  techniques  have  been  the  traditional 
method  for  gradient  information  and  sensitivity  analysis  (Wilson,  1987:378). 

Response  surface  methodology  comprises  a  group  of  statistical  techniques  for 
empirical  model  building  and  model  exploitation.  By  careful  design  and  analysis  of 
experiments,  it  seeks  to  relate  a  response,  or  output  variable  to  the  levels  of  a 
number  of  predictors,  or  input  variables,  that  affect  it  (Box,  1987:1). 

The reader is referred to Box and Draper (1987) for a more complete discussion of RSM.
However, as research into RSM for gradient information and sensitivity analysis has
diminished, alternatives have been investigated. These alternatives include
frequency-domain methods, perturbation analysis, and likelihood-ratio methods (Wilson, 1987:378).
They represent research into methods of model development other than the traditional
methods of RSM. In comparison to these alternatives for
gradient information and sensitivity analysis, the mathematical and statistical foundations
of RSM are more transparent and more developed (Wilson, 1987:378).

Clearly,  while  researchers  and  practitioners  have  focused  on  metamodel 
development,  design,  applications,  and  forms,  the  current  literature  also  indicates  that 
metamodels continue to be the object of ongoing research. Sargent’s summary of 11
different  metamodel  research  issues  included: 

1.  What  type  of  Goodness  of  Fit  criterion  should  be  used  for  the  metamodel? 

2.  What  are  the  trade-offs  between  simplicity  and  complexity  (using,  e.g., 
variance  reduction  techniques)  in  designs? 

3.  What  is  the  required  level  of  accuracy  for  the  metamodel? 

4.  How  to  determine  the  validity  of  the  metamodel  with  respect  to  the  real 
system?  (Sargent,  1991:892). 

While  there  have  been  references  to  the  possibility  of  embedding  metamodels  within  a 
simulation,  little  progress  has  been  made  in  this  area  (Sargent,  1991). 

Metamodel  Uses 

The  use  of  metamodels  to  provide  insight  into  more  complex  systems  has  been  the 
most  common  application  of  metamodels  (Ghosh,  1988:70).  Specific  examples  of  using 
metamodels  to  provide  insight  into  complicated  systems  include  Ghosh’s  evaluation  of  a 
computer’s  multiprocessor  performance  and  Kleijnen’s  sensitivity  analysis  of  the 
greenhouse  effect  (Kleijnen,  1990;  Ghosh,  1988).  Ghosh’s  research  was  significant  in  that 
it  was  the  first  to  fully  explore  the  methodology  issues  regarding  metamodel  research  of 
multiprocessor  performance  (Ghosh,  1988:70).  Kleijnen’s  sensitivity  analysis  revealed  the 
importance  of  key  model  inputs  affecting  the  greenhouse  effect,  like  the  ocean,  which 
unexpectedly  was  quadratic  in  nature,  and  provided  insight  into  a  ‘bug’  in  the  dike-raising 
module  of  the  simulation  (Kleijnen,  1990:17).  These  results  are  typical  of  metamodel 
applications  in  that  significant  effects  are  determined  and  errors  in  model  formulation  are 
revealed. For additional examples, the reader is referred to the extensive list of
documented metamodel applications, forms, and uses contained in Kleijnen (1987).

Embedded  Metamodels 

Sargent  elaborated  on  the  concept  of  embedded  metamodels  in  the  context  of 
hierarchical  modeling.  He  suggested  the  use  of  metamodels  to  replace  one  or  more 
‘lower-level  components’  in  a  parent  simulation.  He  added,  “This  approach  needs 
considerable  development  prior  to  becoming  feasible”  (Sargent,  1993:570).  Sargent  did 
not  list  the  specifics  of  the  required  development,  but  they  would  presumably  include  such 
issues  as  the  inappropriate  reduction  in  the  variability  of  the  simulation  output  and  the 
methods  needed  to  restore  it. 

Central  to  the  issue  of  inappropriately  reducing  the  variance  of  a  simulation  model 
is  the  potential  practice  of  replacing  a  stochastic  simulation  component  with  a 
deterministic  polynomial  expression.  This  area  of  research  appears  to  be  unexplored. 
There  has  been  some  work  in  replacing  ‘lower-level  components’  in  deterministic 
simulations  with  metamodels.  In  fact,  Kleijnen  used  a  metamodel  to  serve  as  a  ‘lower- 
level  component’  of  a  larger  deterministic  simulation  (Kleijnen,  1990:12).  Kleijnen’s 
application  of  replacing  a  deterministic  simulation  component  with  a  deterministic 
metamodel  was  completely  appropriate  and  consistent.  The  problems  with  metamodel 
substitutions  originate  when  a  deterministic  metamodel  replaces  a  stochastic  simulation 
component. These problems revolve around identifying and possibly replacing the
variability  present  in  the  simulation  without  the  embedded  metamodel,  but  diminished  in 
the  simulation  that  contains  an  embedded  metamodel.  The  currently  available  literature 
does  not  address  this  aspect  of  metamodel  use. 
III. Metamodel Effects on Variation


Introduction 

Replacing  a  stochastic  simulation  component  with  a  deterministic  embedded 
metamodel  will  probably  decrease  the  overall  variation  of  the  ‘parent’  simulation.  This 
conjecture  was  tested  using  a  simulation  of  tandem  queues.  By  comparing  the  output 
variance  of  the  parent  simulation  with  that  of  a  simulation  containing  one  or  more 
metamodels,  the  effect  of  metamodel  substitution  on  simulation  output  was  examined. 

Components  of  Variation 

A  number  of  components  or  random  events  generated  within  a  simulation  affect 
the mean and the variance of its response. In a stochastic simulation, several of these
components  will  have  their  own  sources  of  variability.  The  variance  of  a  simulation 
response,  then,  can  be  conceptualized  as  a  function  of  the  variances  of  each  of  its 
components.  Although  a  simulation  might  actually  contain  a  large  number  of  components, 
the  conceptual  model  of  its  variance  could  be  limited  to  those  components  that  have  an 
identifiable  and  quantifiable  effect  on  either  the  mean  or  the  variance  of  the  simulation 
output.  These  individual  components,  or  elements,  could  reasonably  be  restricted  to  the 
major subroutines or processes contained in the simulation. For $n$ such elements, we
could express the mean $\mu_s$ and variance $\sigma_s^2$ of the simulation as

$\mu_s = \mu_1 + \dots + \mu_n$  (3.1)

and

$\sigma_s^2 = \sigma_1^2 + \dots + \sigma_n^2$  (3.2)

respectively, where $\mu_i$ and $\sigma_i^2$ represent the mean and variance of the $i$th component of the
simulation.  If  one  of  these  components  were  deterministic  in  nature,  its  mean,  given  a 
fixed  set  of  inputs,  would  be  constant,  and  therefore  its  variance  about  that  mean  would 
equal zero. Equations 3.1 and 3.2 contain two implicit assumptions: first, that the
components which have these variances are independent, and second, that their
variances are indeed additive. Depending on the actual simulation,
these assumptions may or may not be true.

Equations 3.1 and 3.2 also provide a mechanism for assessing the effect of
replacing  a  major  simulation  component  with  a  metamodel.  If  a  component  that  is 
stochastic  in  nature  is  replaced  by  a  deterministic  function,  the  variance  of  the  simulation 
response  could  be  artificially  diminished. 
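
The additive model of Equations 3.1 and 3.2 can be sketched with a small Monte Carlo experiment. The sketch below assumes five independent, normally distributed components with hypothetical means and unit standard deviations (none of these values come from this thesis), and replaces the first two components with their means, which is how an unbiased deterministic metamodel would behave.

```python
import random
import statistics


def parent_response(rng, means, sds, replaced=()):
    """Sum of independent components; replaced ones return only their mean."""
    total = 0.0
    for i, (mu, sd) in enumerate(zip(means, sds)):
        if i in replaced:
            total += mu                 # deterministic stand-in, zero variance
        else:
            total += rng.gauss(mu, sd)  # stochastic component
    return total


# Hypothetical component means and standard deviations.
means = [1.0, 2.0, 3.0, 4.0, 5.0]
sds = [1.0, 1.0, 1.0, 1.0, 1.0]

rng = random.Random(42)
n_runs = 20000
original = [parent_response(rng, means, sds) for _ in range(n_runs)]
substituted = [parent_response(rng, means, sds, replaced={0, 1})
               for _ in range(n_runs)]

mean_original = statistics.mean(original)
mean_substituted = statistics.mean(substituted)
var_original = statistics.variance(original)        # theory: 5.0
var_substituted = statistics.variance(substituted)  # theory: 3.0
```

Under these assumptions the two sample means agree closely, while the sample variance drops from roughly 5 to roughly 3, the sum of the variances of the three components still simulated.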

Central  to  examining  the  appropriateness  of  replacing  a  stochastic  component  of  a 
simulation  with  a  deterministic  component  is  the  concept  of  relative  variance.  Relative 
variance  is  defined  as: 


Relative Variance = $\sigma_c^2 / \sigma_s^2$  (3.3)

where $\sigma_c^2$ represents the variance of an individual component or the sum of the variances
of a set of components and $\sigma_s^2$ represents the variance of the parent simulation. The
relative  variance  can  be  small  either  because  the  components  simply  have  low  variations 
intrinsically,  or  because  there  are  so  many  sources  of  variation  within  the  simulation  that 
the components’ relative contribution to the output variance is small. Using Equation
3.3, Figure 3-1 illustrates the effect on $\sigma_s^2$ (the variance of the parent simulation) of
substituting either 1, 2, 5, or 10 metamodels into simulations consisting of up to 100
components with equal variance. In regions, such as the upper right portion of Figure
3-1,  where  the  replaced  components  contribute  negligibly  to  the  simulation  output 
variance,  the  metamodel  substitution  would  be  expected  to  have  little  impact.  In  the  lower 
left  portion  of  Figure  3-1,  where  the  replaced  components  accounted  for  much  of  the 
output  variance,  the  metamodel  substitution  would  likely  result  in  a  major  reduction  in 
variance. 
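
For the equal-variance case described above, the fraction of output variance that remains after substitution reduces to one minus the ratio of replaced components to total components. A minimal sketch follows; the function name and the particular simulation sizes tabulated are illustrative assumptions.

```python
def remaining_fraction(n_components, n_replaced):
    """Fraction of parent variance left when n_replaced of n_components
    equal-variance, independent components become deterministic."""
    if not 0 <= n_replaced <= n_components:
        raise ValueError("n_replaced must lie between 0 and n_components")
    return 1.0 - n_replaced / n_components


# Percent of variance remaining for 1, 2, 5, or 10 substitutions.
sizes = (10, 20, 50, 100)
percent_remaining = {
    k: [round(100 * remaining_fraction(n, k), 1) for n in sizes]
    for k in (1, 2, 5, 10)
}
```

With 100 components, one substitution leaves 99 percent of the variance, while ten substitutions into a ten-component simulation leave none.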
The presentation in Figure 3-1 can be generalized by partitioning the components
of variance into two terms: $\sigma_m^2$, the components of variance corresponding to those
elements replaced by one or more metamodels, and $\sigma_r^2$, the components of variance for
the remaining elements not replaced by metamodels. The total variance in the simulation
is simply given by $\sigma_m^2 + \sigma_r^2$. The ratio $\sigma_m^2 / (\sigma_m^2 + \sigma_r^2)$ represents the proportion of the
output variance lost due to one or more metamodel substitutions, while $\sigma_r^2 / (\sigma_m^2 + \sigma_r^2)$
indicates  the  proportion  of  the  output  variance  remaining.  It  is  evident  that  these  two 
ratios  will  always  sum  to  one.  The  line  in  Figure  3-2  depicts  the  theoretical  relationship 
between  these  two  ratios. 


Figure 3-1 Relative Percent of Variation Remaining

Figure 3-2 Percent of Simulation Replaced by Metamodels


Figures  3-1  and  3-2  indicate  that  substituting  a  deterministic  function  for  a  stochastic 
simulation  component  would  inappropriately  reduce  the  output  variance  of  the  parent 
simulation.  Although  not  depicted  in  either  figure,  the  mean  of  the  simulation  output 
would  probably  not  be  significantly  affected  by  metamodel  substitutions,  assuming  that  the 
metamodel  adequately  estimates  the  mean.  Since  a  properly  validated  metamodel  provides 
an  approximately  unbiased  estimate  of  the  mean,  the  overall  system  mean  should  be  largely 
unaffected.  To  test  these  conjectures  an  experiment  was  performed. 

Experiment  Background 

Because  of  its  convenient  mathematical  properties,  a  simulation  of  tandem  queues 
consisting  of  5  M/M/1  queues  was  performed.  The  performance  measure  of  interest  in  the 
simulation  was  queue  length.  Since  each  M/M/1  queue  in  the  system  is  independent,  the 
total  number  of  entities  waiting  in  the  system  is  simply  the  sum  of  the  individual  queue 
lengths (Ross, 1993:374). If $L_s$ represents the total queue length for the tandem queuing
system and $L_i$, $i = 1, 2, \ldots, 5$, represents the queue length for each of the individual
M/M/1 queues, then

$L_s = L_1 + L_2 + L_3 + L_4 + L_5$  (3.4)

Similarly,  the  variance  of  the  system  queue  length  is  the  sum  of  the  variances  for  each  of 
the  queue  lengths.  Thus,  in  this  situation,  the  simulation  output  is  indeed  additive. 

Using  this  simulation,  the  effect  of  replacing  simulation  components  with 
metamodels  was  examined.  As  one  or  more  of  the  queues  were  replaced  by  a  metamodel, 
the  effect  on  the  mean  and  variance  of  the  system  could  be  compared  to  the  mean  and 
variance  of  the  corresponding  system  which  contained  no  metamodel  components. 


Queue  Experiment 

Five M/M/1 queues with identical arrival rates (λ = 16/hr) and service rates
(μ = 18/hr) were simulated for 28 days of operation. The simulation statistics were cleared
after  approximately  8  days  to  remove  any  bias  due  to  start-up  conditions.  Ten  replications 
were  conducted  for  each  queue.  The  mean  and  variance  of  these  ten  replications  for  each 
queue  are  shown  in  Table  3-1. 


Table 3-1 Queue Simulation Results

Queue       Queue Length
Number      Mean        Variance
1           6.565       0.238795
2           7.017       0.261146
3           6.772       0.247496
4           6.940       0.213511
5           6.977       0.278790


The  mean  number  of  customers  in  a  tandem  queuing  system  consisting  of  five 
M/M/1  queues  could  be  estimated  by  the  sum  of  the  mean  queue  lengths  listed  in  Table  3- 
1.  Alternatively,  the  mean  number  of  customers  in  the  system  could  be  estimated  by  the 
sum of simulation and metamodel estimates. The metamodel selected to represent the
mean  length  of  an  M/M/1  queue  was  previously  developed  by  Friedman  (1985). 

The  system  modeled  an  M/M/s  queue,  with  a  single  service  facility  and  a  single 
waiting  line;  demands  were  assumed  to  arrive  according  to  Poisson  process  with  a 
constant  average  arrival  rate  (ARR)  and  service  times  were  assumed  to  follow  an 
exponential  distribution  with  a  constant  average  service  time  (1/SVC).  The 
performance  measure  of  interest,  average  number  of  demands  waiting  for  service 
(LQ)  was  generated  at  the  end  of  each  15-week  run  (Friedman,  1985:145). 

The form of their metamodel is:

$\ln(LQ) = 2.9517 + 15.0099 \ln(ARR) - 14.7682 \ln(SVC) - 14.8217 \ln(NSVR)$  (3.5)


For an arrival rate of λ = 16/hour, service rate of μ = 18/hour, and a single server, the
metamodel yields an expected queue length of 6.569, as opposed to the expected steady
state value of 7.111 (Ross, 1993:360).
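
The two figures quoted above can be checked with a short sketch. The coefficients are those of Equation 3.5. The function and argument names are assumptions introduced for this example, in particular `n_servers` for the final regressor, and the steady state value uses the standard M/M/1 result for the expected number waiting in queue, ρ²/(1 − ρ) with ρ = λ/μ.

```python
import math


def friedman_lq(arr, svc, n_servers=1):
    """Metamodel estimate of mean queue length from Equation 3.5.

    arr: arrival rate, svc: service rate, n_servers: number of servers
    (argument names are illustrative, not taken from the source).
    """
    ln_lq = (2.9517
             + 15.0099 * math.log(arr)
             - 14.7682 * math.log(svc)
             - 14.8217 * math.log(n_servers))
    return math.exp(ln_lq)


def mm1_lq(arrival_rate, service_rate):
    """Steady-state expected number waiting in an M/M/1 queue."""
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("queue is unstable when arrival rate >= service rate")
    return rho ** 2 / (1.0 - rho)


metamodel_estimate = friedman_lq(16.0, 18.0, 1)  # about 6.569
steady_state_value = mm1_lq(16.0, 18.0)          # about 7.111
```

With a single server the last term of Equation 3.5 vanishes, since ln(1) = 0, so only the arrival and service rate terms contribute to the estimate.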

To  examine  the  effect  of  replacing  a  stochastic  simulation  component  with  a 
deterministic  metamodel,  the  simulated  queuing  results  were  sequentially  replaced  by  the 
Friedman metamodel result. Thus, the first metamodel replaced the first queuing
simulation,  the  second  metamodel  replaced  the  second  simulation,  and  ultimately  the  fifth 
metamodel  replaced  the  last  remaining  simulation. 

The  results  of  the  metamodel  substitution  experiment  are  summarized  in  Table  3-2. 


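Table 3-2 Metamodel Substitution Results

                                              Variance                               Total Length of Queues
Scenario                                      Estimate     Actual  % of Actual       Estimate     Actual  % of Actual
All 5 components simulated                    1.24         1.24    100.00            34.27        34.27   100.00
Replace first component w/ MM                 [illegible]  1.24    80.73             34.28        34.27   100.01
Replace first and second components w/ MMs    0.74         1.24    [illegible]       [illegible]  34.27   [illegible]
Replace first three components w/ MMs         0.49         1.24    39.68             33.62        34.27   98.11
Replace first four components w/ MMs          0.28         1.24    [illegible]       33.25        34.27   97.03
Replace all five components w/ MMs            0.00         1.24    0.00              32.85        34.27   95.84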
As each of the five components was sequentially replaced by a metamodel, the variability of
the entire system decreased. In the extreme, when all 5 components were
replaced by metamodels, the variability of the estimated queue length was zero. Because
the metamodel was properly specified and approximately unbiased, there was no
significant difference, at an α = 0.10 level, between the total queue length of the original
system  and  the  total  queue  lengths  predicted  by  any  of  the  simulations  that  contained  one 
or  more  metamodels. 
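
The substitution sequence can be retraced from the per-queue means and variances in Table 3-1 and the metamodel estimate of 6.569. The sketch below is a reconstruction under the additive assumptions of Equations 3.1 and 3.2, not the procedure actually coded for the experiment; each replaced queue contributes the metamodel mean and zero variance. Its percentages may differ slightly from the reported ones because of rounding.

```python
# Per-queue simulation results from Table 3-1.
sim_means = [6.565, 7.017, 6.772, 6.940, 6.977]
sim_vars = [0.238795, 0.261146, 0.247496, 0.213511, 0.278790]
METAMODEL_MEAN = 6.569  # metamodel estimate quoted in the text

actual_mean = sum(sim_means)  # total queue length, all queues simulated
actual_var = sum(sim_vars)    # total variance, all queues simulated

rows = []
for k in range(len(sim_means) + 1):
    # The first k queues are replaced by the deterministic metamodel.
    est_mean = k * METAMODEL_MEAN + sum(sim_means[k:])
    est_var = sum(sim_vars[k:])
    rows.append({
        "replaced": k,
        "variance": est_var,
        "pct_variance": 100.0 * est_var / actual_var,
        "total_length": est_mean,
        "pct_length": 100.0 * est_mean / actual_mean,
    })
```

In this reconstruction the total length falls only from about 34.27 to about 32.85 (roughly 95.8 percent of the simulated value) once all five queues are replaced, while the variance falls from about 1.24 to zero.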

Error  Replacement 

Each  of  the  metamodel  substitutions  in  the  experiment  actually  produced  a  new  or 
modified  model  consisting  of  a  different  mix  of  simulation  and  metamodel  components. 
The  reduction  in  simulation  output  variance  due  to  metamodel  substitution  resulted  from 
ignoring  the  variance  of  the  embedded  metamodels.  The  variance  of  a  metamodel 
response  is  generally  not  the  same  as  the  variance  of  the  simulation  component  it  replaced. 
A  procedure  to  compensate  for  the  inappropriately  diminished  variance  of  the  simulation 
output  could  attempt  to  restore  the  variance  that  was  present  in  the  original  simulation. 

On  the  other  hand,  an  alternative  procedure  might  focus  on  estimating  the  variance  of  the 
new  or  modified  model.  The  question  of  which  approach  is  preferable  is  a  philosophical 
modeling  issue  that  has  yet  to  be  resolved. 

The Mean Square Error (MSE) of a properly specified least-squares metamodel is
an unbiased estimator of the variance of the simulation response at a given design point.
In contrast, the variance of a least squares metamodel response is a function of the
variances of the regression coefficients and the covariances between pairs of regression
coefficients (Neter, 1990:244). This variance is given by

$s^2(\hat{Y}_h) = MSE \, (X_h' (X'X)^{-1} X_h)$  (3.6)
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where  the  X  matrix  is  the  original  design  matrix,  and  the  Xh  vector  is  the  vector  of  the 
specific  point  about  which  the  variance  is  to  be  determined. 

Equation  3.6  is  the  variance  of  the  expected  mean  response  of  the  simulation  at  a 
specific  design  point,  as  opposed  to  the  variance  of  a  predicted  response  of  an  individual 
outcome  at  a  specific  design  point.  The  variance  of  a  predicted  response  of  an  individual 
outcome  at  a  specific  design  point  must  also  consider  the  variance  about  the  conditional 
mean.  Thus,  the  variance  of  a  predicted  response  of  an  individual  outcome  at  a  specific 
design point is (Neter, 1990:246)

$$ s^2(Y_{h(new)}) = MSE \, \left( 1 + X_h' (X'X)^{-1} X_h \right) \qquad (3.7) $$

Equation  3.7  represents  the  component  of  variance  that  the  metamodel  contributes 
to  the  simulation  output  for  an  individual  response.  For  r  individual  responses,  the 
variance  of  the  sample  mean  is  simply  the  estimated  population  variance  divided  by  r. 
Under  the  assumption  that  the  metamodel  provides  a  valid  measure  of  the  variance  of 
individual  outcomes,  the  variance  of  the  mean  response  of  r  replications  could  be 
estimated  by 


$$ \text{Component Variation} = \frac{MSE \, \left( 1 + X_h' (X'X)^{-1} X_h \right)}{r} \qquad (3.8) $$


Equation 3.8 could be used to adjust the precision of the original metamodel to more
closely correspond to that of the replaced simulation component. In this study, Equation
3.8 was used to compensate for the inappropriate variance reduction induced by ignoring
the variance of embedded metamodels.
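
The three variance expressions above can be illustrated numerically. The following is a minimal sketch, assuming an ordinary least-squares fit with NumPy; the function name `metamodel_variances` and the synthetic one-factor data are hypothetical and are not part of the original study.

```python
import numpy as np

def metamodel_variances(X, y, x_h, r=1):
    """Return (MSE, Eq. 3.6, Eq. 3.7, Eq. 3.8) for a least-squares metamodel.

    X   : n-by-p design matrix (include a column of ones for the intercept)
    y   : n simulation responses
    x_h : design point (length p) at which the variances are evaluated
    r   : number of replications averaged at x_h
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_h = np.asarray(x_h, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (n - p)
    # X_h' (X'X)^{-1} X_h, computed without forming the inverse explicitly
    leverage = float(x_h @ np.linalg.solve(X.T @ X, x_h))
    s2_mean = mse * leverage              # Equation 3.6: variance of the mean response
    s2_pred = mse * (1.0 + leverage)      # Equation 3.7: variance of an individual response
    component_variation = s2_pred / r     # Equation 3.8: variance of the mean of r responses
    return mse, s2_mean, s2_pred, component_variation

# Hypothetical data: one input factor, 30 design points, normal noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.uniform(0.0, 1.0, 30)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(0.0, 0.5, 30)
mse, s2_mean, s2_pred, comp_var = metamodel_variances(X, y, x_h=[1.0, 0.5], r=10)
print(mse, s2_mean, s2_pred, comp_var)
```

With r = 10, as in the tandem queue experiment, the value from Equation 3.8 is one tenth of the individual-response variance from Equation 3.7.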

Figure 3-3 depicts the results of the metamodel substitution experiment and the use
of Equation 3.8 to compensate for the inappropriate reduction in the variance of the
simulation output. The Xs denote the loss of system variability as first one, then more, of
the queues were replaced by metamodels. When all five of the queues were replaced with
metamodels, the system variation was zero. The diamonds denote the variance of the
overall system when the variance of each metamodel response was estimated by
Equation 3.8 with r = 10 replications. In this instance, application of Equation 3.8
appeared to adequately account for the inappropriate reduction in the variance of the
simulation output.


[Figure 3-3 legend: ideal relation; without variance insert (Xs); with variance insert (diamonds)]

Figure 3-3 Percent Variation Remaining with and without Variance Reinsertion

Thus,  for  a  simple  set  of  tandem  queues,  the  variance  of  the  simulation  response 
was  diminished  by  replacing  stochastic  simulation  components  with  deterministic 
metamodels.  Furthermore,  a  postulated  method  of  accounting  for  the  inappropriate 
variance  reduction  restored  the  output  variance  to  approximately  the  same  level  as  the 
original  simulation.  Since,  for  a  given  design  point,  the  metamodels  closely  approximate 
the  simulation  mean,  the  use  of  metamodels  did  not  significantly  affect  the  mean  of  the 
tandem  queuing  system,  as  shown  in  Table  3-2. 

This experiment showed that, for a simple singly sampled embedded metamodel, it
was possible to compensate for the variance diminished by the use of metamodels. This
example dealt with the use of a polynomial metamodel and relied on the linear model
assumption of normally distributed residuals. The next chapter generalizes this approach
to the case of repeatedly sampling an embedded metamodel.

IV.  Repeatedly Sampling an Embedded Metamodel


Introduction 

It has been established that, in some cases, an embedded metamodel reduces the
variability of the overall simulation model, and that there is at least one method of
compensating for that diminished variability. This chapter extends the model established
in Chapter III to the case of repeatedly sampling an embedded metamodel. The results of
an experiment show that the variability of the parent simulation is again inappropriately
reduced, and that the methods developed in Chapter III can be adapted to restore the
variance of the parent simulation to approximately its original level.

Background  of  Model 

The  model  needed  for  this  type  of  experiment  was  fairly  specific:  a  stochastic 
simulation  with  several  submodules,  each  of  which  must  have  a  quantifiable  mean  and 
variance.  The  output  of  each  of  these  submodules  must  feed  into  an  overall  parent 
simulation which, again, must have both a quantifiable mean and variance.

The simulation created to meet these requirements dealt with the scenario of a car
dealer who buys four cars of the same type (e.g., a Corvette) every day, one from each of
four cities located near each other. The buyer visits these cities and buys each car at a
price which depends on six characteristics: miles, age, number of wrecks, appearance,
rust, and color. The buyer’s estimate of the price of the car based on these six factors is
assumed to be the ‘true,’ or simulated, price of the car. It is assumed that no other factors
affect the price of the car. The simulation variable of interest is the total cost of buying
the four cars each day.




The  dealer  considered  purchasing  cars  over  the  phone  and  wished  to  build  a 
metamodel  to  estimate  car  prices  based  on  the  most  important  input  factors.  It  was 
assumed  that  data  would  be  collected  over  some  time  period  (200  days)  and  that  this  data 
would  be  used  to  create  a  metamodel  for  use  in  the  future.  The  data  for  these  200  days 
was  created  by  generating  the  information  for  each  of  the  six  input  variables.  Equation  4.1 
was  used  to  generate  the  ‘true’  prices  of  the  cars. 

$$ y = 34000 - 800 \cdot Age + 72 \cdot Appearance - 20 \cdot Color - 0.03839 \cdot Miles - 367 \cdot Rust - 100 \cdot Wrecks \qquad (4.1) $$

A random component, distributed N(0, σ²), was added to Equation 4.1 to create a specific
level of variance in the data base. The variance, σ², of the random component was set to
three different levels to examine the effect of repeatedly sampling a metamodel in
situations ranging from low to high variation. Further details of how the data base was
generated are included in Appendix A.
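
The data generation described above, together with the input distributions listed in Appendix A, could be sketched as follows. This is an illustration only: the original data base was built in EXCEL, the function name `generate_car_data` is hypothetical, Color is assumed here to be a continuous Uniform(1, 10) value, and Normal(15, 5) is assumed to mean a mean of 15 and a standard deviation of 5 (thousands of miles).

```python
import numpy as np

def generate_car_data(n_days=200, noise_sd=1500.0, seed=0):
    """Generate one hypothetical data base of car prices per Equation 4.1 plus noise."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(0.0, 7.0, n_days)            # years
    appearance = rng.binomial(15, 0.2, n_days)     # Binomial(p=.2), 15 trials
    wrecks = rng.binomial(10, 0.1, n_days)         # Binomial(p=.1), 10 trials
    color = rng.uniform(1.0, 10.0, n_days)         # assumed continuous
    rust = rng.binomial(8, 0.1, n_days)            # Binomial(p=.1), 8 trials
    # Miles are loosely tied to age: miles per year (assumed sd of 5K) times age.
    miles = rng.normal(15_000.0, 5_000.0, n_days) * age
    # Equation 4.1: the 'true' price before noise is added.
    true_price = (34000.0 - 800.0 * age + 72.0 * appearance - 20.0 * color
                  - 0.03839 * miles - 367.0 * rust - 100.0 * wrecks)
    noise = rng.normal(0.0, noise_sd, n_days)      # N(0, sigma^2) random component
    return {"age": age, "appearance": appearance, "wrecks": wrecks, "color": color,
            "rust": rust, "miles": miles, "true_price": true_price,
            "price": true_price + noise}

# Standard deviations of the three scenarios in the experiment.
scenarios = {name: generate_car_data(noise_sd=sd, seed=1)
             for name, sd in [("low", 100.0), ("medium", 1500.0), ("high", 4000.0)]}
for name, data in scenarios.items():
    print(name, round(float(data["price"].std(ddof=1)), 1))
```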

Experiment 

There were three scenarios for this experiment, each with a different level of
variance in the simulation response. The ‘low,’ ‘medium,’ and ‘high’ scenarios had
standard deviations of 100, 1500, and 4000 dollars, respectively, in the random component
added to Equation 4.1. In each of these scenarios, a metamodel was created from 200
days of data. Only terms with a p-value of 0.10 or less were included in these models,
which are presented in Appendix B.

In each of the three scenarios, 50 ‘days’ of operation beyond the data collection
period were simulated for each city. All the cities were assumed to be the same in every
way, and each represented a submodule of the larger simulation. The output for a single
day’s operation was the total cost of purchasing a car in each of the four cities. The
output for the entire simulation of 50 days was the mean daily car buying cost incurred by
the dealer. Figure 4-1 shows the model of the simulation.


Figure 4-1 Diagram of Simulation

Each  city  submodule  was  sampled  once  for  each  day  of  the  simulation.  When  an 
embedded  metamodel  replaced  one  of  the  simulation  components,  it  was  also  sampled 
once  for  every  day  of  the  simulation.  Thus,  this  experiment  illustrated  the  concept  of 
repeatedly  sampling  an  embedded  metamodel. 

First, the mean and variance of the daily car buying cost for the completely
simulated original system were calculated. The mean and variance for the original system
were then compared to those of the models containing one or more metamodels. In
agreement with the results from Chapter III, the mean was nearly constant in all three
scenarios (low, medium, high), but the variance of the simulation output was
inappropriately diminished in both the medium and high scenarios. In the low scenario,
with little variation present initially, there was no significant loss in the output variance.
Table 4-1 and Figure 4-2 show this reduction in the output variance as first one, then
another, of the metamodels replaced the simulation components. The values depicted in
Table 4-1 and Figure 4-2 indicate the percentage of the original output variance present
for each scenario.

Table 4-1 Queue Simulation Results for Embedded Metamodels

Scenario               Low     Medium   High
Original System        100     100      100
1 Metamodel Used       100     98       95.6
2 Metamodels Used      99.8    88.4     86.7
3 Metamodels Used      100     86.7     75.5
4 Metamodels Used      100     83.8     60.9

(Values shown are percent of original output variance)


Error  Replacement 

Having demonstrated the potential for an inappropriate loss of output variance
when a stochastic simulation model repeatedly samples a deterministic metamodel, the
problem of compensating for the diminished output variance remains. To account for the
inappropriate variance reduction, the techniques described in Chapter III were reapplied to
the current problem with two significant differences. Since the queues examined in
Chapter III were identical, Equation (3.8)

$$ \text{Component Variation} = \frac{MSE \, \left( 1 + X_h' (X'X)^{-1} X_h \right)}{r} \qquad (3.8) $$

required only one X_h vector or design point. In the current problem, however, every car
purchase represented a unique design point and corresponding X_h vector. In addition,
there were 10 replications used in Chapter III, so r was equal to 10. In this problem, there
were no replications, and r was equal to one.

An even more fundamental difference between the two problems lies in the method
of compensating for the inappropriate reduction of the output variance. In Chapter III,
the variance of the mean queue length was calculated once for each queue. The variance
of the mean queue length for the entire system was simply the sum of those variances. In
the current repeated sampling problem, a different approach was required. The diminished
variance had to be compensated for every time the metamodel was sampled. Assuming an
adequate linear polynomial metamodel, the error of any predicted individual response
would be approximately normally distributed with a mean of zero and a variance, σ²,
given by Equation 3.8. Thus, one possible method of compensating for the
inappropriate variance reduction in the repeated sampling case would be to generate an
N(0, σ²) random variate and add it to the metamodel response every time the metamodel
is sampled. This modified metamodel response is given by

$$ y' = f(x) + N(0, \sigma^2) \qquad (4.2) $$

where f(x) is the metamodel and N(0, σ²) represents a random sample from the specified
Normal distribution.
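
The modified response of Equation 4.2 can be sketched in a few lines. This is an illustrative example rather than the procedure used in the study: the helper names `fit_metamodel` and `modified_response`, the one-input stochastic component `simulate`, and all numeric settings are assumptions introduced here. The variance σ² is taken from Equation 3.8 with r = 1 and is recomputed at every design point.

```python
import numpy as np

def fit_metamodel(X, y):
    """Least-squares fit; returns the coefficients, the MSE, and (X'X)^-1."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (X.shape[0] - X.shape[1])
    return beta, mse, np.linalg.inv(X.T @ X)

def modified_response(beta, mse, xtx_inv, x_h, rng):
    """Equation 4.2: metamodel value at x_h plus a N(0, sigma^2) draw."""
    sigma2 = mse * (1.0 + x_h @ xtx_inv @ x_h)   # Equation 3.8 with r = 1
    return x_h @ beta + rng.normal(0.0, np.sqrt(sigma2))

rng = np.random.default_rng(1)

def simulate(x):
    """Hypothetical stochastic component: y = 10 + 2x + N(0, 3^2)."""
    return 10.0 + 2.0 * x + rng.normal(0.0, 3.0, size=np.shape(x))

# Build the metamodel from 200 simulated observations.
x_fit = rng.uniform(0.0, 10.0, 200)
X_fit = np.column_stack([np.ones_like(x_fit), x_fit])
beta, mse, xtx_inv = fit_metamodel(X_fit, simulate(x_fit))

# Sample the component, the plain metamodel, and the modified metamodel repeatedly.
x_new = rng.uniform(0.0, 10.0, 20000)
X_new = np.column_stack([np.ones_like(x_new), x_new])
y_sim = simulate(x_new)
y_plain = X_new @ beta
y_mod = np.array([modified_response(beta, mse, xtx_inv, row, rng) for row in X_new])

var_sim, var_plain, var_mod = y_sim.var(ddof=1), y_plain.var(ddof=1), y_mod.var(ddof=1)
print(f"simulated: {var_sim:.1f}  plain metamodel: {var_plain:.1f}  modified: {var_mod:.1f}")
```

In this sketch the plain metamodel output varies only through its inputs, so its variance falls short of the simulated component by roughly the noise variance, while the modified response adds that variance back.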

When the modified metamodels were applied to each of the three scenarios (low,
medium, high), the general results of Chapter III were again duplicated. In both the
medium and high variance scenarios, the modified metamodel successfully compensated for
the inappropriate reduction in the output variance. There was no variance reduction
problem in the low-variance scenario, and the modified metamodel had virtually no effect
on the statistical quality of its output. The results of this experiment are summarized in
Figures 4-3 through 4-5.

Figure 4-3 shows that when the original component has low variability, or is
nearly deterministic, replacing that component with a truly deterministic metamodel has
little impact on the output variance of the parent simulation. In this case, use of the
modified metamodel neither improved nor degraded the output variance. In Figure 4-4,
the expected effect of replacing stochastic subcomponents with deterministic components
begins to emerge. The level of variance of the parent simulation was approximately
restored to its original level through the use of the modified metamodel. Finally, in Figure
4-5, the effect on the variance of the simulation output when stochastic subcomponents
are replaced with deterministic metamodels is most readily apparent. Again, the variance
of the simulation output was approximately restored to its original level by the variance
compensation actions.

[Figure 4-3 plot. Panel title: (sigma of noise = 100) (R^2 of regression = .99947). x-axis: number of metamodels replaced out of 4 components; y-axis: % variation present; series: no error compensation, with error compensation.]

Figure 4-3 Low Variance Scenario

[Figure 4-4 plot. Panel title: (sigma of noise = 1500) (R^2 of regression = .7555). x-axis: number of metamodels replaced out of 4 components; y-axis: % variation present; series: no error compensation, with error compensation.]

Figure 4-4 Medium Variance Scenario

[Figure 4-5 plot. Panel title: (sigma of noise = 4000) (R^2 of regression = .38875). x-axis: number of metamodels replaced out of 4 components; y-axis: % variation present; series: no error compensation, with error compensation.]

Figure 4-5 High Variance Scenario

Conclusion

The inappropriate reduction in the variance of the simulation output when
stochastic components were replaced with deterministic metamodels seemed to apply to
repeated sampling, as it did for the non-repeated sampling of Chapter III. When there was
a significant level of variation initially present in a simulation component, replacing that
component with a metamodel inappropriately diminished the variation of the simulation
response. However, by compensating for this diminished error, the variance of the
simulation could be restored to approximately its original level. The general solution
method introduced in Chapter III to compensate for this diminished variance also
appeared to be effective for the repeated sampling case. In the scenario containing low
variance initially, the variance compensation method appeared to neither improve nor
degrade the level of the variance present.

V.  Metamodels and Distributions


Introduction

In Chapters III and IV the focus was on the variance of the simulation output and
when it might be inappropriately diminished due to a metamodel substitution. However, a
discussion of embedded metamodels would be incomplete without discussing when an
embedded metamodel is not appropriate at all. In some situations, simply sampling a
random number from a known distribution is more appropriate than inserting a
metamodel.

Distributions  vs  Metamodels 

A  response  surface  approximates  a  mechanistic,  or  theoretical  model.  If  there  is  a 
fairly  well-defined  output  measure  for  a  given  set  of  inputs,  a  response  surface  or 
metamodel  would  be  appropriate.  Although  the  randomness  in  observed  or  simulated  data 
creates  a  degree  of  uncertainty  in  the  exact  value  of  the  expected  output  of  any  individual 
observation,  it  is  possible  that  this  random  variation  could  be  relatively  small,  if  not  nearly 
zero.  An  example  of  such  a  mechanistic  model  is  Einstein’s  famous  equation 

$$ E = mc^2 \qquad (5.1) $$

which  states  that  a  given  amount  of  mass,  m,  will  yield  a  specific  amount  of  energy,  E, 
when  the  mass  is  converted  to  energy. 

On  the  other  hand,  some  outputs  conform  more  to  a  probability  distribution  than 
to  a  mechanistic  model.  For  example,  the  time  to  failure  for  a  light  bulb  is  described  fairly 
well  by  an  exponential  distribution.  It  is  not  defined  by  a  mechanistic  input/output 
relationship.  Thus,  it  is  better  modeled  by  sampling  from  a  probability  distribution. 
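
As a small illustration of modeling such an output by sampling rather than by a metamodel, the sketch below draws light bulb lifetimes from an exponential distribution using Python's standard library. The function name `sample_time_to_failure` and the mean life of 1000 hours are hypothetical values chosen only for the example.

```python
import random

def sample_time_to_failure(mean_life_hours, rng):
    """Draw one exponentially distributed time to failure (in hours)."""
    # random.expovariate takes the rate parameter, which is 1 / mean.
    return rng.expovariate(1.0 / mean_life_hours)

rng = random.Random(42)
MEAN_LIFE_HOURS = 1000.0  # hypothetical value, not taken from the source
lifetimes = [sample_time_to_failure(MEAN_LIFE_HOURS, rng) for _ in range(20000)]
sample_mean = sum(lifetimes) / len(lifetimes)
print(f"sample mean life: {sample_mean:.0f} hours")
```

No input factors appear in the call: each lifetime is a draw from the distribution, which is the distinction this section makes between a distribution and a mechanistic input/output model.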

The case mentioned in Chapter I of a theater-level combat model calling a
lower-level engagement model to resolve a one-on-one conflict between two aircraft is not
so clear. Using a metamodel as a surrogate for an engagement model implies an underlying
model that yields credible results (e.g., an integer number of survivors). Without an
underlying mechanistic model, either a theoretical or empirical distribution might be a far
more appropriate representation.

Conclusion 

When deciding whether to use an embedded metamodel, it is important to
consider whether there is a well-defined mechanistic input/output relationship that
warrants the use of a metamodel. If there is not a well-defined mechanistic input/output
relationship, then perhaps it is more appropriate to sample from a probability distribution
or, in some cases, simply to use the simulation subcomponent the metamodel was being
evaluated to replace.

VI.  Conclusions and Recommendations


Introduction 

As stated previously, the purpose of this research was to (1) provide insight into
how embedded metamodels affect the output of a ‘parent’ simulation, particularly with
regard to repeatedly sampling an embedded metamodel, and (2) suggest guidelines for
their use. Conclusions from preceding chapters are formalized here. In addition,
recommendations for additional research are presented.

Conclusions 

The theoretical framework for the components of the system variance discussed in
Chapter III appeared to correctly predict the effect of a metamodel substitution for singly
sampled metamodels, namely an inappropriate reduction in the system variation. When
this effect was tested, the empirical results were consistent with the conjecture of an
inappropriate reduction in the variance. Using the same theoretical framework for the
components of the system variance and exploiting the properties of the linear model, the
system variance was returned to approximately its original level for both the singly and
repeatedly sampled embedded metamodels, further validating the theoretical framework for
the components of the system variance. Finally, metamodels are not always appropriate
substitutes for simulation components. Some simulation components are better
represented by distributions of outcomes than by empirical models.

Recommendations 

The conjecture that different types of simulations can be represented by either a
metamodel or a sample from a known probability distribution needs to be further explored
and defined. The techniques described in Chapters III and IV for compensating for the
missing level of variance need to be extended and tested on larger simulations to
determine if these results are also consistent and to validate the theoretical framework for
the components of the system variance. Lastly, this thesis was limited to linear polynomial
metamodels. Other types of metamodels should be examined for the possibility of
compensating for inappropriately diminished variance.

Appendix A: Data Base Generation


The database for this research was created in EXCEL. The 200 random numbers
for each input variable (age, miles, number of wrecks, appearance, rust, and color) were
created using its random number generation feature. The 200 random numbers were created
from the following distributions:

1.  Age = Uniform(0, 7) years

2.  Appearance = Binomial(p = .2), 15 trials

3.  Number of Wrecks = Binomial(p = .1), 10 trials

4.  Color = Uniform(1, 10)

5.  Rust = Binomial(p = .1), 8 trials

Miles  needed  to  be  ‘loosely’  tied  to  age.  A  random  number  was  drawn  from  a 
Normal  (15, 5)  distribution  where  the  units  are  thousands  of  miles.  This  was  then 
multiplied  by  age  to  get  miles. 

The six factors were related to the price of the car by

$$ y = 34000 - 800 \cdot Age + 72 \cdot Appearance - 20 \cdot Color - 0.03839 \cdot Miles - 367 \cdot Rust - 100 \cdot Wrecks \qquad (4.1) $$

To provide and control the variance of the model, a column of ‘noise’ was created
and added to Equation 4.1. This noise was set to 3 different levels corresponding to each
of the 3 scenarios. Part of the database follows.

First 50 of a Set of 200: Shows Medium Case of Variation


N(0,1500)

197.88 

1352.60 


U(15K,5K) 

15426’ 

18298 


347.74 


-32 


1798.47 


-3092.38 


-967.19 


176.49 


-412.44 

1991.58 

'187.35 

1086.14 

-2373.79 

'1151.61 

-2001.88 

-286.13 


20917 

7327 

20902 

8039 

6528 

18638 

27612 

13904 

6307 

14110 

10182 

16118 


Miles 
28035 ' 
92099 


3 

133.57 

10335 

23077 

4 

79.04 

17377 

731 

5 

419.47 

12349 

62249 

-859.72 

23149 

7 

827.33 

12167 

8 

1226.72 

9 

1614.60 

18836 

10 

4210.05 

14583 

101677 

11 

-1216.81 

14523 

90305 

12 

-814.00 

4765 

10665 

1503.53 

1 

-1361.41 

1684.56 

1 

-657.61 

1 

2329.10 

2 

Age 

1.82 

5.03 

2.23 

0.04 

5.04 


#  Wrecks  Appearan 


1683.91 


1030 

0.08 

101077 

5.77 

14811 

1.23 

94634 

4.52 

57769 

3.33 

47309 

2.26 

10004 

0.55 

57392 

3.47 

125938 

6.76 

112086 

4.06 

60887 

4.38 

19881 

3.15 

69105 

4.90 

3.18 

mmgn 

1.25 

0.18 

2.42 

4.76 

4.62 

4.75 

3.67 

3.04 

IKEE&E] 

2.88 

116455 

5.07 

Paid-1 

31203 

27125 

31554 

34116 

28041 


Appendix B: Metamodels for Chapter IV


This appendix presents the fitted metamodel and corresponding ANOVA table for each of
the three scenarios (low, medium, and high variance). All metamodels were developed
using terms with p-values of significance not greater than .10. Also, all metamodels
have p-values for overall significance of the model not greater than 1E-17.


Low Variance Scenario: (σ_noise = 100 dollars) (R^2 of Equation = .9993)

$$ y = 34023.13 - 0.03904 \cdot miles - 794.555 \cdot age - 109.871 \cdot wrecks + 73.591 \cdot appearance - 367.04 \cdot rust - 20.828 \cdot color $$

Regression Statistics

Multiple R           0.999467035
R Square             0.998934353
Adjusted R Square    0.998901224
Standard Error       98.20757802
Observations         200

Analysis of Variance

             df     Sum of Squares   Mean Square    F              Significance of F
Regression   6      1744901549       290816924.8    30152.94089    7.0013E-284
Residual     193    1861432.577      9644.728381
Total        199    1746762981

             Coefficients    Std Error      t Statistic    P-value
Intercept    34023.12747                                   0
miles        -0.03903782     0.000368809    -105.848328    6.5876E-177
age          -794.555046     6.738454719    -117.913539    4.3214E-186
wrecks       -109.871007     7.451914156    -14.743998     9.70993E-34
appearance   73.591407       4.673888194
rust         -367.036552     9.151668955    -40.1059691    2.71574E-97
color        -20.8274553     2.339816596


Medium Variance Scenario: (σ_noise = 1500 dollars) (R^2 of Equation = .7555)

$$ y = 33984.00 - 0.03807 \cdot miles - 748.549 \cdot age - 474.439 \cdot rust $$

Regression Statistics

Multiple R           0.869204378
R Square             0.755516251
Adjusted R Square    0.751774152
Standard Error       1619.527333
Observations         200

Analysis of Variance

             df     Sum of Squares   Mean Square    F              Significance of F
Regression   3      1588643493       529547831.1    201.8964254    1.09158E-59
Residual     196    514082281.2      2622868.781
Total        199    2102725775

             Coefficients    Std Error      t Statistic    P-value
Intercept    33984.00072     257.3527148    132.0522332    9.3657E-196
miles        -0.03806599     0.006061683    -6.27977248    2.08872E-09
age          -748.54866      110.1077442    -6.79832891    1.20514E-10
rust         -474.439147     146.2709828    -3.243563      0.001384438
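
As a consistency check on the medium variance output above, the summary statistics can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. The short sketch below uses only the standard ANOVA identities; the variable names are chosen here for illustration.

```python
import math

# Values copied from the medium variance ANOVA table.
ss_regression = 1588643493.0
ss_residual = 514082281.2
ss_total = 2102725775.0
df_regression, df_residual = 3, 196

ms_regression = ss_regression / df_regression      # mean square for regression
mse = ss_residual / df_residual                    # mean square error
f_statistic = ms_regression / mse
r_square = ss_regression / ss_total
adjusted_r_square = 1.0 - (1.0 - r_square) * (df_regression + df_residual) / df_residual
standard_error = math.sqrt(mse)                    # square root of the MSE

print(f"F = {f_statistic:.4f}")
print(f"R Square = {r_square:.6f}, Adjusted R Square = {adjusted_r_square:.6f}")
print(f"Standard Error = {standard_error:.3f}")
```

The recomputed values agree with the tabulated F, R Square, Adjusted R Square, and Standard Error to the precision shown. The MSE of about 2,622,869 is the quantity that enters Equation 3.8 for this scenario.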

High Variance Scenario: (σ_noise = 4000 dollars) (R^2 of Equation = .3888)

$$ y = 36065.98 - 0.05975 \cdot miles - 508.371 \cdot age - 840.412 \cdot rust $$

Regression Statistics

Multiple R           0.623495361
R Square             0.388746465
Adjusted R Square    0.369743765
Standard Error       4115.28689
Observations         200

Analysis of Variance

             df     Sum of Squares   Mean Square    F              Significance of F
Regression   6      2078751674       346458612.3    20.45743256    1.75837E-18
Residual     193    3268568135       16935586.19
Total        199    5347319809

             Coefficients    Std Error      t Statistic    P-value
Intercept    36065.98138     1073.064047    33.61027842    5.53761E-84
miles        -0.05975387     0.015454561    -3.86642286    0.000149508
age          -508.371356     282.3679692    -1.80038606    0.073314067
wrecks       -480.637942     312.2647483    -1.53920013    0.125343999
appearance   -250.001362     195.8544463    -1.27646508    0.203278791
rust         -840.411605     383.4912136    -2.19147551    0.029578142
color        -123.495297     98.04759221